Geographic Information Retrieval: Classification, Disambiguation and Modelling
Author
Abstract
This thesis aims to augment the Geographic Information Retrieval process with information extracted from world knowledge. This aim is approached from three directions: classifying world knowledge, disambiguating placenames and modelling users. Geographic information is becoming ubiquitous across the Internet, with a significant proportion of web documents and web searches containing geographic entities, and with the proliferation of Internet-enabled mobile devices. Traditional information retrieval treats these geographic entities in the same way as any other textual data. In this thesis I augment the retrieval process with geographic information, and show how methods built upon world knowledge outperform methods based on heuristic rules. The source of world knowledge used in this thesis is Wikipedia. Wikipedia has become a phenomenon of the Internet age and needs little introduction. As a linked corpus of semi-structured data, it is unsurpassed. Two approaches to mining information from Wikipedia are rigorously explored: initially I classify Wikipedia articles into broad categories; this is followed by much finer classification where Wikipedia articles are disambiguated as specific locations. The thesis concludes with the proposal of the Steinberg hypothesis: by analysing a range of Wikipedias in different languages, I demonstrate that a localised view of the world is ubiquitous and inherently part of human nature. All people perceive closer places as larger and more important than distant ones. The core contributions of this thesis are in the areas of extracting information from Wikipedia, supervised placename disambiguation, and providing a quantitative model for how people view the world. The findings clearly have a direct impact on applications such as geographically aware search engines, but in a broader context documents can be automatically annotated with machine-readable metadata and dialogue enhanced with a model of how people view the world. This will reduce ambiguity and confusion in dialogue between people or computers.
Similar resources
GIR Experiments with Forostar at GeoCLEF 2007
In this paper we describe our Geographic Information Retrieval experiments with Forostar, our GIR application on the GeoCLEF 2007 corpus and query set. We compare the results from orthogonal text with no geographic entities and only geographic entities with standard text retrieval and combined text and geographic relevance methods. The text and named entity analysis and retrieval methods of For...
Tratamiento de la dimensión espacial en el texto y su aplicación a la recuperación de información
This project is focused on toponym disambiguation and geographical focus identification in text. The goal is to improve the performance of geographic information retrieval systems. This paper describes the problems faced, working hypothesis, tasks proposed and goals currently achieved.
Semantic Disambiguation of Thesaurus as a Mechanism to Facilitate Multilingual and Thematic Interoperability of Geographical Information Catalogues
Nowadays, there is a growing interest within the geographic information system community to find the location of geographic data through the Internet and to know the features and the possibilities of these data. In order to make this information easily accessible both to trained users and to the general public it is necessary to implement a specific infrastructure of geographical information pr...
Place Disambiguation with Co-occurrence Models
In this paper we describe the geographic information retrieval system developed and results achieved by the Multimedia & Information Systems team for GeoCLEF 2006. We detail our methods for generating and applying co-occurrence models for the purpose of place name disambiguation, our use of named entity recognition tools and text indexing applications. The presented system is split into two sta...
Exploring Image, Text and Geographic Evidences in ImageCLEF 2007
This year, ImageCLEF2007 data provided multiple evidences that can be explored in many different ways. In this paper we describe an information retrieval framework that combines image, text and geographic data. Text analysis implements the vector space model based on non-geographic terms. Geographic analysis implements a placename disambiguation method and placenames are indexed by the...
Publication date: 2008